ABSTRACT

Data mining is the process of analyzing data from different perspectives and summarizing it
into useful information that can be used to increase revenue, cut costs, or both. Data mining software
is one of a number of analytical tools for analyzing data. It allows users to analyze data from many
different dimensions or angles, categorize it, and summarize the relationships identified. Technically,
data mining is the process of finding correlations or patterns among dozens of fields in large
relational databases. Data mining in Java remains a challenging problem: when an application backed
by a large database opens, it takes a long time to load and retrieve the data, and the application may
also contain a large number of bugs. The problem of "Big Data Mining" is therefore still an open
issue. We address it with a decision tree classifier combined with k-means clustering of the error
data and a bug search over the source code of the application. We enhance bug detection by combining
the K-means clustering algorithm with a multi-threaded decision tree. In this research work, the
problem of classifying bugs is identified.
Data mining techniques have increasingly been studied [7], especially in their application to real-world
databases. One typical problem is that databases tend to be very large, and these techniques often repeatedly
scan the entire set. Sampling has been used for a long time, but with sampling subtle differences among sets of
objects become less evident. This work provides an overview of some important data mining techniques and their
applicability to large databases.

Objectives 

[1] To maintain the sustainability of code written in object-oriented languages.

[2] To help testers classify the bugs present in a piece of source code.

[3] To improve the response/execution time of the proposed algorithm using a decision tree.

[4] To attain an accuracy that supports the future growth of the testing phase of various IT applications and
smartphone applications.

Different levels of analysis are available: 

• Artificial neural networks: Non-linear predictive models that learn through training and resemble 
biological neural networks in structure. 

• Genetic algorithms: Optimization techniques that use processes such as genetic combination, mutation, 
and natural selection in a design based on the concepts of natural evolution. 

• Decision trees: Tree-shaped structures that represent sets of decisions. These decisions generate rules for 
the classification of a dataset. Specific decision tree methods include Classification and Regression Trees 
(CART) and Chi Square Automatic Interaction Detection (CHAID). CART and CHAID are decision tree
techniques used for classification of a dataset. They provide a set of rules that you can apply to a new 
(unclassified) dataset to predict which records will have a given outcome. CART segments a dataset by 
creating 2-way splits while CHAID segments using chi square tests to create multi-way splits. CART 
typically requires less data preparation than CHAID. 

• Nearest neighbor method: A technique that classifies each record in a dataset based on a combination of
the classes of the k record(s) most similar to it in a historical dataset (where k ≥ 1). Sometimes called the
k-nearest neighbor technique.

• Rule induction: The extraction of useful if-then rules from data based on statistical significance. 
• Data visualization: The visual interpretation of complex relationships in multidimensional data. Graphics
tools are used to illustrate data relationships. 
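
As a rough illustration of the 2-way splitting that CART performs, the Python sketch below scans one numeric attribute for the threshold that minimizes the weighted Gini impurity of the two sides. The function names, the severity values, and the "minor"/"major" labels are invented for this example and are not taken from the system described in this paper. A full CART implementation would also recurse on each side and prune the tree.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the list is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_binary_split(values, labels):
    """Return (threshold, weighted_gini) for the best split 'value <= threshold'."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, float("inf"))
    for i in range(1, n):
        low, high = pairs[i - 1][0], pairs[i][0]
        if low == high:
            continue  # equal values cannot be separated by a threshold
        threshold = (low + high) / 2
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical records: a severity score and an invented class label.
severity = [0.05, 0.09, 0.10, 0.30, 0.40, 0.50]
label = ["minor", "minor", "minor", "major", "major", "major"]

threshold, impurity = best_binary_split(severity, label)
print("split at severity <=", threshold, "weighted Gini:", impurity)
```

The split with the lowest weighted impurity becomes one if-then rule of the tree. CHAID would instead choose its multi-way splits with chi-square tests.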

C Programming Error types: While writing C programs, errors (known as bugs in the programming world)
may creep in unintentionally and prevent the program from compiling or running as the programmer
expects. Basically, there are three types of errors in C programming:
[1] Runtime Errors 
[2] Compile Errors 
[3] Logical Errors 

C Runtime Errors: C runtime errors are errors that occur during the execution of a C program,
generally because of some illegal operation performed in the program. Examples of illegal
operations that may produce runtime errors are:

• Dividing a number by zero 

• Trying to open a file which is not created 

• Lack of free memory space 

Note that these errors may stop program execution. To counter this, a program should be written so that it
can handle such unexpected errors and continue operating rather than terminating unexpectedly. This ability
is known as robustness, and the code used to make a program robust is known as guard code, since it guards
the program against terminating abruptly when execution errors occur.

Compile Errors: Compile errors are errors that occur when the program is compiled. C
compile errors may be further classified as:

Syntax Errors: When the rules of the C programming language are not followed, the compiler reports syntax
errors.

For example, consider the statement,

    int a,b:

The above statement produces a syntax error because it is terminated with : rather than ;

Semantic Errors: Semantic errors are reported by the compiler when the statements written in the C program are
not meaningful to the compiler.

For example, consider the statement,

    b+c=a;

In the above statement we are trying to assign the value of a to the result of adding b and c, which
has no meaning in C. The correct statement is

    a=b+c;

Logical Errors: Logical errors are errors in the output of the program. They lead to undesired or
incorrect output and are caused by a mistake in the logic that the program applies to produce the
desired output.

Entries in the tabbed pane: the following error categories are defined.



www.ijceronline.com 



Open Access Journal 



Page 20 



Enhanced Bug Detection By. 





Tabs: Category | Categories | Error | Errors | K-Means | Bug Detection | About Us

id   Description                                       Severity
1    Syntax error                                      0.5
2    undefined variable                                0.3
3    class not found                                   0.2
4    unused variable                                   0.1
5    Statement missing ;                               0.05
6    compound statement missing }                      0.09
7    undefined symbol                                  0.15
8    function call missing )                           0.07
9    unterminated string                               0.06
10   warning divide by zero                            0.1
11   Declaration terminated incorrectly                0.4
12   Function should have prototype                    0.5
13   Type mismatch in parameter                        0.05
14   Cannot convert datatype to const                  0.35
15   Forgetting to put a break in a switch statement   0.25
16   Using = instead of ==                             0.1
17   Forgetting to put an ampersand on an argument     0.3
18   Using the wrong format for operand                0.4
19   Size of arrays                                    0.1
20                                                     0.2
21   Not initialising pointers                         0.5
22   Confusing character and string constants          0.75
23   Comparing strings with ==                         0.45
24   Not null terminating strings                      0.2
25   Not leaving room for the null terminator          0.5
26   Using fgetc(), etc. incorrectly                   0.8
27   Using feof() incorrectly                          0.2
28   Leaving characters in the input buffer            0.5
29   Using the gets() function                         0.4
30   variable name styles                              0.1
31   Overstepping array boundaries                     0.3
32                                                     0.5
33   Spatial memory error                              0.4

Some snapshots of the application follow:

(a) Error tab: First we enter the name of the category and its severity and save them for the
next step.




Fig 1.1 



(b) Errors tab: In this tab we enter the error category and a description of the error, for example
"semicolon is missing" or "error in line 2".



Tabbed Pane Application (tabs: Category | Categories | Error | Errors)

id   c id   Information
1    1      example syntex error
2    3      example claas not found
3    1      example syntex error
4    4      example unused veriable
5    2      example undefined variable
6    2      example undefined variable
7    4      example unused veriable
8    3      example claas not found
9    1      tg
10   4      fq
11   1      sd
12   1      m
13   1      ihhjhii

Fig 1.2
Bug Detection: First we show the snapshot of bug detection using K-means clustering. In the
Bug Detection tab we write our C program in the code window and then press the Detect
button. If the program contains any errors, they are shown in the output window. In this
snapshot four errors are detected, and the total number of errors is displayed.



[Screenshot of the Bug Detection tab; the legible compiler output includes "(first use in this function)".]

Fig 1.3



Next, in this window the K-means clusters are formed from the values above.



Tabbed Pane Application (tabs: Category | Categories | Error | Errors | K-Means | Bug Detection | About Us)

(70, 2, 0.3),
(71, 6, 0.09),
(72, 5, 0.05),
(73, 2, 0.3),
(74, 2, 0.3),
(75, 5, 0.05),
(76, 6, 0.09)

Cluster 1:

(1, 1, 0.5),
(2, 3, 0.2),
(3, 1, 0.5),
(4, 4, 0.1),
(5, 2, 0.3),
(6, 2, 0.3),
(7, 4, 0.1),
(8, 3, 0.2),
(9, 4, 0.1),
(10, 1, 0.5)

Fig 1.4



Next, this window displays the predicted error probability obtained from the decision tree and the
K-means result.
Tabbed Pane Application (tabs: Category | Categories | Error | Errors | K-Means | Decision Tree | Detect | About Us)

    void main() {
    int i = 0;
    printf("%d", r
    getch();
    }

    1) Analyzing error
    test.c:8:2: error: expected ')' before 'getch'
    BUG DETECTED
    (*) Error classified successfully
    (*) Error category: 8
    (*) Error category description: [function call missing )]
    2) Analyzing error
    test.c:10:1: error: expected ';' before '}' token

Fig 1.5
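
The paper does not show how a compiler message is mapped to an error category. One plausible approach, sketched below in Python purely as an assumption, is to parse the GCC-style diagnostic line (file:line:column: error: message) and match the message against keyword rules. The category ids and descriptions are those of the category table above; the regular expressions, the `RULES` list, and the `classify` function are hypothetical and only illustrate the idea.

```python
import re

# GCC-style diagnostic line: file:line:column: error: message
DIAGNOSTIC = re.compile(
    r"^(?P<file>[^:]+):(?P<line>\d+):(?P<col>\d+): error: (?P<msg>.+)$"
)

# Hypothetical keyword rules. The (id, description) pairs follow the
# category table; the patterns themselves are assumptions.
RULES = [
    (re.compile(r"expected '\)'"), (8, "function call missing )")),
    (re.compile(r"expected ';'"), (5, "Statement missing ;")),
    (re.compile(r"undeclared"), (2, "undefined variable")),
]


def classify(diagnostic_line):
    """Return (source line number, category or None), or None if unparsable."""
    match = DIAGNOSTIC.match(diagnostic_line.strip())
    if match is None:
        return None  # not a file:line:column diagnostic (e.g. a linker message)
    for pattern, category in RULES:
        if pattern.search(match.group("msg")):
            return int(match.group("line")), category
    return int(match.group("line")), None


for text in [
    "test.c:8:2: error: expected ')' before 'getch'",
    "test.c:10:1: error: expected ';' before '}' token",
]:
    print(classify(text))
```

Messages without a file, line, and column prefix, such as linker errors, would need a separate rule in a scheme like this.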



How the K-means clustering algorithm works:

Start
  -> Number of clusters K
  -> Centroid
  -> Distance of objects to centroids
  -> Grouping based on minimum distance
  -> No object moves group? (if any object moved, return to the centroid step; otherwise end)

Fig 1.6
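
The flowchart steps (choose K, place the centroids, compute the distance of each object to the centroids, group by minimum distance, and repeat until no object changes group) can be sketched in Python as follows. This is a minimal illustration rather than the implementation used in the tool: the function name `kmeans`, the sample points, and the initial centroids are made up, and K is simply the number of initial centroids supplied.

```python
import math


def kmeans(points, initial_centroids, max_iter=100):
    """Basic K-means; returns (centroids, assignment of each point to a cluster)."""
    centroids = [tuple(c) for c in initial_centroids]
    assignment = [None] * len(points)
    for _ in range(max_iter):
        # Distance of objects to centroids, then grouping by minimum distance.
        new_assignment = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no object moved to another group, so the algorithm stops
        assignment = new_assignment
        # Recompute each centroid as the mean of the points assigned to it.
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = tuple(sum(x) / len(members) for x in zip(*members))
    return centroids, assignment


# Made-up 2-D points forming two obvious groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
centroids, assignment = kmeans(points, initial_centroids=[(1, 1), (8, 8)])
print(centroids)
print(assignment)
```

Each pass computes n*K distances over d attributes, which is where the O(n*K*I*d) cost for I iterations comes from.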

K-means Clustering: 

Complexity is O(n*K*I*d)

- n = number of points, K = number of clusters, I = number of iterations, d = number of attributes

- Easily parallelized

- Use kd-trees or other efficient spatial data structures for some situations: Pelleg and Moore (X-means)

- Sensitivity to initial conditions
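
Sensitivity to initial conditions can be seen on a very small example. The Python sketch below, which uses an invented data set of four points at the corners of a 4-by-1 rectangle, runs the same K-means procedure from two different pairs of starting centroids and compares the resulting sum of squared errors (SSE). The helper names `sq_dist` and `run_kmeans` are illustrative only.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def run_kmeans(points, centroids, max_iter=50):
    """Run K-means from the given initial centroids; return (centroids, SSE)."""
    centroids = list(centroids)
    for _ in range(max_iter):
        groups = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda k: sq_dist(p, centroids[k]))
            groups[nearest].append(p)
        # The mean of each group becomes the new centroid; empty groups keep theirs.
        updated = [
            tuple(sum(x) / len(g) for x in zip(*g)) if g else c
            for g, c in zip(groups, centroids)
        ]
        if updated == centroids:
            break  # converged: the centroids no longer change
        centroids = updated
    sse = sum(min(sq_dist(p, c) for c in centroids) for p in points)
    return centroids, sse


# Invented data: the corners of a wide, flat rectangle.
corners = [(0, 0), (0, 1), (4, 0), (4, 1)]

centroids_a, sse_a = run_kmeans(corners, [(0, 0), (4, 0)])  # one start on each side
centroids_b, sse_b = run_kmeans(corners, [(0, 0), (0, 1)])  # both starts on the left
print(centroids_a, sse_a)
print(centroids_b, sse_b)
```

The second start converges to a top/bottom split that is stable but has a much larger SSE than the left/right split, which is why K-means is often run several times from different starting centroids.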

Limitations of K-means: 
• K-means has problems when clusters are of differing
o Sizes 

o Densities 

o Non-globular shapes 

• Problems with outliers 

• Empty clusters 

Limitations of K-means: Differing Density
[Figure panels: Original Points; K-means (3 Clusters)]

Fig 1.7

Limitations of K-means: Non-globular Shapes
[Figure panels: Original Points; K-means (2 Clusters)]

Overcoming K-means Limitations:
[Figure panels: Original Points; K-means Clusters]

Fig 1.8

Clustering Analysis

Fig 1.9



Clustering is the division of data into groups containing similar objects. It is used in fields such as pattern
recognition and machine learning [2]. Searching for clusters involves unsupervised learning.
Result

Tabs: Category | Categories | Error | Errors | K-Means | Decision Tree | Detect | About Us

    void main(){
    int i=0;
    int r;
    printf("%d", r);
    getch();
    }

    Building C source code file...
    1) Analyzing error
    collect2: error: ld returned 1 exit status
    (*) Error classified successfully
    (*) Error category: 48
    (*) Error category description: [linker error]

Fig 1.10



Tabs: Category | Categories | Error | Errors | K-Means | Bug Detection | About Us

    void main() {
    fg=5;
    for(t=0; ...){
    fg=fg ... t

    Predicting the error probability using Decision Tree & K-Means results

    Error: test.c:5:3: error: 'fg' undeclared (first use in this function)
    Prediction: 10.526316
    Error: test.c:6:7: error: 't' undeclared (first use in this function)
    Prediction: 10.526316
    Error: test.c:6:15: error: expected ';' before ')' token
    Prediction: 7.8947363
    Error: test.c:8:1: error: expected declaration or statement at end of input
    Prediction: 7.8947363

Fig 1.11

Conclusion 

Clustering is one of the most essential steps in data mining. It is the process of grouping data items 
based on similarity between elements in a cluster and dissimilarities between clusters. In this paper we have 
provided an overview of the broad classification of clustering algorithms such as partitioning, hierarchical, 
density based and grid based methods. 

In this project, bugs are detected by invoking a C compiler from Java. When a program is written in the
Bug Detection tab and the Detect button is pressed, the tool reports the errors in the program. It displays
the number of errors and their types, such as syntax errors and undefined variables, together with the
category that each error matches.

The tool also displays the clusters produced by the K-means clustering algorithm and the prediction
obtained from the decision tree combined with K-means.

Future scope: In future work, a better clustering algorithm can be used, and more languages can be
analyzed, such as .NET, C++, and Python.